Heavy Traffic Optimal Resource Allocation 
Algorithms for Cloud Computing Clusters 



Siva Theja Maguluri and R. Srikant
Department of ECE and CSL
University of Illinois at Urbana-Champaign
siva.theja@gmail.com; rsrikant@illinois.edu

Lei Ying
Department of ECEE
Arizona State University
lying6@asu.edu






Abstract — Cloud computing is emerging as an important plat- 
form for business, personal and mobile computing applications. 
In this paper, we study a stochastic model of cloud computing, 
where jobs arrive according to a stochastic process and request 
resources like CPU, memory and storage space. We consider a 
model where the resource allocation problem can be separated 
into a routing or load balancing problem and a scheduling problem.
We study the join-the-shortest-queue and power-of-two-choices
routing algorithms combined with the MaxWeight scheduling
algorithm. These algorithms are known to be throughput
optimal. In this paper, we show that they are also queue
length optimal in the heavy traffic limit.

Index Terms — Scheduling, load balancing, cloud computing, 
resource allocation. 

I. Introduction 

Cloud computing services are emerging as an important 
resource for personal as well as commercial computing appli- 
cations. Several cloud computing systems are now commer- 
cially available, including Amazon EC2 system [7|, Google's 
AppEngine 0J, and Microsoft's Azure [3|. A comprehensive 
survey on cloud computing can be found in 0, ifTTl . 

In this paper, we focus on cloud computing platforms that 
provide infrastructure as service. Users submit requests for 
resources in the form of virtual machines (VMs). Each request 
specifies the amount of resources it needs in terms of processor 
power, memory, storage space, etc. We call these requests
jobs. The cloud service provider first queues these requests 
and then schedules them on physical machines called servers. 

Each server has a limited amount of resources of each kind. 
This limits the number and types of jobs that can be scheduled 
on a server. The set of jobs of each type that can be scheduled 
simultaneously at a server is called a configuration. The convex 
hull of the possible configurations at a server is the capacity 
region of the server. The total capacity region of the cloud is 
then the Minkowski sum of the capacity regions of all servers. 

The simplest architecture for serving the jobs is to queue 
them at a central location. In each time slot, a central scheduler 
chooses the configuration at each server and allocates jobs 
to the servers, in a preemptive manner. As pointed out in 
03], this problem is then identical to scheduling in an ad 
hoc wireless network with interference constraints. In practice, 
however, jobs are routed to servers upon arrival. Thus, queues 
are maintained at each individual server. It was shown in [15]
that join-the-shortest-queue-type algorithms for routing, along
with the MaxWeight scheduling algorithm [22] at each server,
are throughput optimal. The focus of this paper is to study the
delay, or equivalently, the queue length performance of the
algorithms presented in [15].

Characterizing the exact delay or queue length in general is 
difficult. So, we study the system in the heavy-traffic regime, 
i.e., when the exogenous arrival rate is close to the boundary 
of the capacity region. In this regime, for some systems, 
the multi-dimensional state of the system reduces to a single 
dimension, called state-space collapse. In ifTBI . Il23ll . a method 
was outlined to use the state-space collapse for studying the 
diffusion limits of several queuing systems. This procedure 
has been successfully applied to a variety of multiqueue 
models served by multiple servers ||201 . IfTTl . iflZl . [@], But 
these models assume that the system is work conserving, i.e., 
queued jobs are processed at maximum rate by each server. 
Stolyar 12T1 . generalized this notion of state-space collapse 
and resource pooling to a generalized switch model, where 
it is hard to define work-conserving policies. This was used 
to establish the heavy traffic optimality of the MaxWeight 
algorithm. 

Most of these results are based on considering a scaled version
of queue lengths and time, which converges to a regulated
Brownian motion, and then showing sample-path optimality in
the scaled time over a finite time interval. This then allows
a natural conjecture about the steady-state distribution. In [8],
the authors present an alternate method to prove heavy traffic
optimality that is not only simpler, but also shows heavy traffic
optimality in unscaled time. In addition, this method directly
obtains heavy-traffic optimality in steady state. The method
consists of the following three steps.

(1) Lower bound: First a lower bound is obtained on the 
weighted sum of expected queue lengths by comparing 
with a single-server queue. A lower bound for the single- 
server queue, similar to the Kingman bound [14], then 
gives a lower bound to the original system. 

(2) State-space collapse: The second step is to show that 
the state of the system collapses to a single dimension. 
Here, it is not a complete state-space collapse, as in the 
Brownian limit approach, but an approximate one. In 
particular, this step is to show that the queue length along 
a certain direction increases as the exogenous arrival rate 
gets closer to the boundary of the capacity region but the 



queue length in any perpendicular direction is bounded. 
(3) Upper bound: The state-space collapse is then used to 
obtain an upper bound on the weighted queue length. 
This is obtained by using a natural Lyapunov function 
suggested by the resource pooling. Heavy-traffic opti- 
mality can be obtained if the lower bounds and the upper 
bounds coincide. 
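
To make the scaling behind step (1) concrete, the following sketch works out a toy discrete-time single-server queue exactly. The queue, its arrival distribution (a batch of 2 with probability p, otherwise 0), the unit service rate, and all function names are illustrative assumptions and are not taken from the paper. Under these assumptions the product of the slack eps and the mean queue length stays bounded and approaches half the arrival variance as eps shrinks.

```python
def stationary_mean(p, n_max=20000):
    """Mean stationary length of phi(t+1) = max(phi(t) + a(t) - 1, 0),
    where a(t) = 2 with probability p and 0 otherwise (needs p < 1/2)."""
    if not 0.0 < p < 0.5:
        raise ValueError("stability requires 0 < p < 1/2")
    rho = p / (1.0 - p)  # birth-death chain: pi[n + 1] = rho * pi[n]
    weight, total, weighted = 1.0, 0.0, 0.0
    # Truncated geometric sums; the tail is negligible for the p values used below.
    for n in range(n_max):
        total += weight
        weighted += n * weight
        weight *= rho
    return weighted / total


def heavy_traffic_rows(ps):
    """Return (eps, eps * E[phi], var / 2) for each arrival parameter p."""
    rows = []
    for p in ps:
        eps = 1.0 - 2.0 * p        # service rate minus mean arrival rate
        var = 4.0 * p * (1.0 - p)  # variance of a(t)
        rows.append((eps, eps * stationary_mean(p), var / 2.0))
    return rows


if __name__ == "__main__":
    for eps, scaled_mean, half_var in heavy_traffic_rows([0.40, 0.45, 0.49, 0.499]):
        print(f"eps={eps:.3f}  eps*E[phi]={scaled_mean:.4f}  var/2={half_var:.4f}")
```

For this toy chain the stationary distribution is geometric, so eps * E[phi] equals p, and both printed columns tend to 1/2 as p approaches 1/2.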

In this paper, we apply the above three-step procedure to
study the resource allocation algorithms presented in [15]. We
briefly review the results in [15] now. Jobs are first routed to
the servers, and are then queued at the servers, and a scheduler 
schedules jobs at each server. So, we need an algorithm that 
has two components, viz., 

1) a routing algorithm that routes new jobs to servers in 
each time slot (we assume that the jobs are assigned to 
a server upon arrival and they cannot be moved to a 
different server) and 

2) a scheduling algorithm that chooses the configuration of 
each server, i.e., in each time slot, it decides which jobs 
to serve. Here we assume that jobs can be preempted, 
i.e., a job can be served in a time slot, and then be 
preempted if it is not scheduled in the next time slot. Its 
service can be resumed in the next time it is scheduled. 
Such a model is applicable in situations where job sizes 
are typically large. 

It was shown in [15] that using join-the-shortest-queue (JSQ)
routing with the MaxWeight scheduling algorithm is
throughput optimal. In Section III we show that this policy
is queue length optimal in the heavy traffic limit when all the
servers are identical. We use the three-step procedure described
above to prove the heavy traffic optimality. The lower bound
in this case is identical to the case of the MaxWeight scheduling
problem. However, state-space collapse does not directly
follow from the corresponding results for the MaxWeight
algorithm in [8] due to the additional routing step here. We
use the state-space collapse to obtain an upper bound that
coincides with the lower bound in the heavy traffic limit.

JSQ needs queue length information of all servers at the 
router. In practice, this communication overhead can be quite 
significant when the number of servers is large. An alternative 
algorithm is the power-of-two-choices routing algorithm. In 
each time slot, two servers are chosen uniformly at random and 
new arrivals are routed to the server with the shorter queue. 
It was shown in [15] that the power-of-two-choices routing
algorithm with MaxWeight scheduling is throughput optimal
if all the servers are identical. Here, we show that the
heavy-traffic optimality in this case is a minor modification
of the corresponding result for JSQ routing and MaxWeight
scheduling.
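
The two routing rules discussed above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the queue lengths are passed as a plain list for a single job type, and the two sampled servers are assumed to be distinct (the text only says they are chosen uniformly at random).

```python
import random


def jsq_route(queues, rng=random):
    """Join-the-shortest-queue: needs every queue length, ties broken at random."""
    shortest = min(queues)
    candidates = [m for m, q in enumerate(queues) if q == shortest]
    return rng.choice(candidates)


def power_of_two_route(queues, rng=random):
    """Power-of-two-choices: sample two distinct servers, join the shorter queue."""
    m1, m2 = rng.sample(range(len(queues)), 2)
    if queues[m1] < queues[m2]:
        return m1
    if queues[m2] < queues[m1]:
        return m2
    return rng.choice((m1, m2))  # tie between the two sampled servers


if __name__ == "__main__":
    rng = random.Random(0)
    queues = [4, 4, 1, 9]
    print("JSQ picks server", jsq_route(queues, rng))
    print("Power-of-two picks server", power_of_two_route(queues, rng))
```

The router in the second rule reads only two queue lengths per decision, which is the communication saving the text refers to.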

A special case of the resource allocation problem is when 
all the jobs are of same type. In this case, scheduling is not 
required at each server. The problem reduces to a routing- 
only problem which is well studied ED, 0, (6), ED, fl9l . 
For reasons to be explained later, the results from Section
III cannot be applied in this case since the capacity region
is along a single dimension (of the form $\lambda < \mu$). In Section
IV we show heavy traffic optimality of the power-of-two-choices
routing algorithm. The lower and upper bounds in
this case are identical to the case of JSQ routing in [8].
The main contribution here is to show state-space collapse,
which is somewhat different compared to [8]. The results here
complement the heavy-traffic optimality results in @, |[T3l 
which were obtained using Brownian motion limits. 

Note on Notation 

The set of real numbers, the set of non-negative real numbers,
and the set of positive real numbers are denoted by $\mathbb{R}$, $\mathbb{R}_+$
and $\mathbb{R}_{++}$ respectively. We denote vectors in $\mathbb{R}^J$ or $\mathbb{R}^M$ by
$x$, in normal font. We use bold font $\mathbf{x}$ to denote vectors in
$\mathbb{R}^{JM}$. Dot product in the vector spaces $\mathbb{R}^J$ or $\mathbb{R}^M$ is denoted
by $\langle x, y \rangle$ and the dot product in $\mathbb{R}^{JM}$ is denoted by $\langle \mathbf{x}, \mathbf{y} \rangle$.

II. System Model and Algorithm 

Consider a discrete time cloud computing system as follows. 
There are $M$ servers indexed by $m$. Each server has $I$ different
kinds of resources such as processing power, disk space,
memory, etc. Server $m$ has $R_{i,m}$ units of resource $i$ for
$i \in \{1, 2, 3, \dots, I\}$. There are $J$ different types of jobs indexed
by $j$. Jobs of type $j$ need $r_{i,j}$ units of resource $i$ for their
service. A job is said to be of size $D$ if it takes $D$ units of
time to finish its service. Let $D_{\max}$ be the maximum allowed
service time.

Let $\mathcal{A}_j(t)$ denote the set of type-$j$ jobs that arrive at the
beginning of time slot $t$. Indexing the jobs in $\mathcal{A}_j(t)$ from 1
through $|\mathcal{A}_j(t)|$, we define $a_j(t) = \sum_{k \in \mathcal{A}_j(t)} D_k$ to be the
overall size of the jobs in $\mathcal{A}_j(t)$, or the total time slots requested
by the jobs in $\mathcal{A}_j(t)$. Thus, $a_j(t)$ denotes the total work load
of type $j$ that arrives in time slot $t$. We assume that $a_j(t)$ is a
stochastic process which is i.i.d. across time slots, $E[a_j(t)] = \lambda_j$
and $\Pr(a_j(t) = 0) > \epsilon_A$ for some $\epsilon_A > 0$ for all $j$ and $t$.
Many of these assumptions can be relaxed, but we make these
assumptions for the ease of exposition. Second moments of the
arrival processes are assumed to be bounded. Let $\mathrm{var}[a_j(t)] = \sigma_j^2$,
$\lambda = (\lambda_1, \dots, \lambda_J)$ and $\sigma = (\sigma_1, \dots, \sigma_J)$. We denote
$\sigma^2 = (\sigma_1^2, \dots, \sigma_J^2)$.

In each time slot, the central router routes the new arrivals
to one of the servers. Each server maintains $J$ queues corresponding
to the work loads of the $J$ different types of jobs.
Let $q_{j,m}(t)$ denote the total backlogged job size of the type $j$
jobs at server $m$ at time slot $t$.

Consider server $m$. We say that server $m$ is in configuration
$s = (s_1, s_2, \dots, s_J) \in (\mathbb{Z}_+)^J$ if the server is serving $s_1$ jobs of
type 1, $s_2$ jobs of type 2, etc. This is possible only if the server
has enough resources to accommodate all these jobs. In other
words,

$$\sum_{j=1}^{J} s_j r_{i,j} \le R_{i,m} \quad \forall i \in \{1, 2, \dots, I\}.$$

Let $s_{\max}$ be the maximum number of jobs of any type that can be
scheduled on any server. Let $\mathcal{S}_m$ be the set of feasible configurations
on server $m$. We say that $s$ is a maximal configuration if
no other job can be accommodated, i.e., for every $j'$, $s + e_{j'}$
(where $e_{j'}$ is the unit vector along $j'$) violates at least one
of the resource constraints. Let $\mathcal{C}^*_m$ be the convex hull of the
maximal configurations of server $m$. Let
$\mathcal{C}_m = \{ s \in (\mathbb{R}_+)^J : s \le s^* \text{ for some } s^* \in \mathcal{C}^*_m \}$.
Here $s \le s^*$ means $s_j \le s^*_j$ for all $j \in \{1, 2, \dots, J\}$.
$\mathcal{C}_m$ can be thought of as the capacity region for
server $m$. Note that if $\lambda \in \text{interior}(\mathcal{C}_m)$, there exists an $\epsilon > 0$
such that $\lambda(1 + \epsilon) \in \mathcal{C}_m$. $\mathcal{C}_m$ is a convex polytope in the
nonnegative quadrant of $\mathbb{R}^J$.
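
The definitions of feasible and maximal configurations can be checked on a small example. The sketch below is illustrative only: the resource amounts, the job requirements and the function names are made-up assumptions, and it assumes every job type needs a positive amount of at least one resource so that the enumeration is finite.

```python
from itertools import product


def is_feasible(s, R, r):
    """True if configuration s fits: sum_j s[j] * r[i][j] <= R[i] for every resource i."""
    return all(
        sum(s[j] * r[i][j] for j in range(len(s))) <= R[i] for i in range(len(R))
    )


def feasible_configurations(R, r):
    """Enumerate all feasible integer configurations of one server."""
    num_types = len(r[0])
    # Largest count of type-j jobs that fits when no other job is present.
    bounds = [
        min(R[i] // r[i][j] for i in range(len(R)) if r[i][j] > 0)
        for j in range(num_types)
    ]
    candidates = product(*(range(b + 1) for b in bounds))
    return [s for s in candidates if is_feasible(s, R, r)]


def maximal_configurations(R, r):
    """Feasible configurations to which no further job of any type can be added."""
    maximal = []
    for s in feasible_configurations(R, r):
        grown = [tuple(x + (1 if k == j else 0) for k, x in enumerate(s))
                 for j in range(len(s))]
        if not any(is_feasible(g, R, r) for g in grown):
            maximal.append(s)
    return maximal


if __name__ == "__main__":
    R = [4, 4]            # hypothetical units of two resources on one server
    r = [[1, 2], [2, 1]]  # r[i][j]: units of resource i needed by one type-j job
    print(feasible_configurations(R, r))
    print(maximal_configurations(R, r))
```

The convex hull of the maximal configurations returned here plays the role of $\mathcal{C}^*_m$ for this hypothetical server.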

Define

$$\mathcal{C} = \sum_{m=1}^{M} \mathcal{C}_m = \Big\{ s \in (\mathbb{R}_+)^J : \exists\, s^m \in \mathcal{C}_m \ \forall m \text{ s.t. } s \le \sum_{m=1}^{M} s^m \Big\}.$$

Here $s^m$ just denotes an element in $\mathcal{C}_m$ and not the $m$-th power of
$s$. Thus $\mathcal{C} = \sum_{m=1}^{M} \mathcal{C}_m$, where $\sum$ denotes the Minkowski sum
of sets. So, $\mathcal{C}$ is again a convex polytope in the nonnegative
quadrant of $\mathbb{R}^J$, and can be described by a set of hyperplanes
as follows:

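$$\mathcal{C} = \{ s \ge 0 : \langle c^{(k)}, s \rangle \le b^{(k)}, \ k = 1, \dots, K \}$$

where $K$ is the number of hyperplanes that completely defines
$\mathcal{C}$, and $(c^{(k)}, b^{(k)})$ completely defines the $k$-th hyperplane $\mathcal{H}^{(k)}$:
$\langle c^{(k)}, s \rangle = b^{(k)}$. Since $\mathcal{C}$ is in the first quadrant, we have

$$\|c^{(k)}\| = 1, \quad c^{(k)} \ge 0, \quad b^{(k)} \ge 0 \quad \text{for } k = 1, 2, \dots, K.$$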

It was shown in [15] that $\mathcal{C}$ is the capacity region of this
system. Similar to $\mathcal{C}$, define $\mathcal{S} = \sum_{m=1}^{M} \mathcal{S}_m$.

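Lemma 1: Given the $k$-th hyperplane of the capacity
region $\mathcal{C}$ (i.e., $\langle c^{(k)}, \lambda \rangle = b^{(k)}$), for each server $m$, there is a
$b_m^{(k)}$ such that $\langle c^{(k)}, \lambda \rangle = b_m^{(k)}$ is the boundary of the capacity
region $\mathcal{C}_m$, and $b^{(k)} = \sum_{m=1}^{M} b_m^{(k)}$. Moreover, for every set
$\{\lambda_m^{(k)} \in \mathcal{C}_m\}_m$ such that $\lambda^{(k)} = \sum_{m=1}^{M} \lambda_m^{(k)}$ and $\lambda^{(k)} \in \mathcal{C}$ lies
on the $k$-th hyperplane $\mathcal{H}^{(k)}$, we have that $\langle c^{(k)}, \lambda_m^{(k)} \rangle = b_m^{(k)}$.

Proof: Define $b_m^{(k)} = \max_{s \in \mathcal{C}_m} \langle c^{(k)}, s \rangle$. Then, since
$\mathcal{C} = \sum_{m=1}^{M} \mathcal{C}_m$, we have that $b^{(k)} = \sum_{m=1}^{M} b_m^{(k)}$.

Again, by the definition of $\mathcal{C}$, for every $\lambda^{(k)} \in \mathcal{C}$, there
are $\lambda_m^{(k)} \in \mathcal{C}_m$ for each $m$ such that $\lambda^{(k)} = \sum_{m=1}^{M} \lambda_m^{(k)}$.
However, these may not be unique. We will prove that for
every such $\{\lambda_m^{(k)}\}_m$, for each $m$, $\langle c^{(k)}, \lambda_m^{(k)} \rangle = b_m^{(k)}$. Suppose,
for some server $m_1$, $\langle c^{(k)}, \lambda_{m_1}^{(k)} \rangle < b_{m_1}^{(k)}$. Then, since
$\langle c^{(k)}, \sum_{m=1}^{M} \lambda_m^{(k)} \rangle = \sum_{m=1}^{M} b_m^{(k)}$, there exists $m_2$ such that
$\langle c^{(k)}, \lambda_{m_2}^{(k)} \rangle > b_{m_2}^{(k)}$, which is a contradiction. Thus, we have
the lemma. ■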
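
The identity $b^{(k)} = \sum_m b_m^{(k)}$ in Lemma 1 rests on the fact that a linear functional maximized over a Minkowski sum equals the sum of the per-set maxima. The sketch below checks this numerically on two hypothetical servers described by finite vertex sets (a linear functional over a convex hull attains its maximum at a vertex). The vertex sets, the direction c and the function names are illustrative assumptions.

```python
from itertools import product


def support(c, points):
    """Maximum of the linear functional <c, x> over a finite set of points."""
    return max(sum(ci * xi for ci, xi in zip(c, x)) for x in points)


def minkowski_sum(sets):
    """All sums x_1 + ... + x_M with x_m taken from the m-th set."""
    return {tuple(map(sum, zip(*combo))) for combo in product(*sets)}


if __name__ == "__main__":
    c = (0.6, 0.8)                       # unit-norm direction (hypothetical)
    server_1 = [(0, 2), (1, 1), (2, 0)]  # vertices of a hypothetical C_1
    server_2 = [(0, 1), (3, 0)]          # vertices of a hypothetical C_2
    per_server = support(c, server_1) + support(c, server_2)
    pooled = support(c, minkowski_sum([server_1, server_2]))
    print(per_server, pooled)            # the two values agree up to rounding
```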

III. JSQ Routing and MaxWeight Scheduling

In this section, we will study the performance of JSQ
routing with MaxWeight scheduling, as described in Algorithm 1.

Algorithm 1 JSQ Routing and MaxWeight Scheduling

1) Routing Algorithm: All the type $j$ arrivals in a time slot
are routed to the server with the smallest queue length
for type $j$ jobs, i.e., the server
$m^*_j(t) = \arg\min_{m \in \{1, 2, \dots, M\}} q_{j,m}(t)$.
Ties are broken uniformly at random.

2) Scheduling Algorithm: In each time slot, server $m$
chooses a configuration $s^m \in \mathcal{C}^*_m$ so that
$s^m(t) \in \arg\max_{s \in \mathcal{C}^*_m} \sum_{j=1}^{J} s_j q_{j,m}(t)$.
It then schedules up to a maximum
of $s^m_j$ jobs of type $j$ (in a preemptive manner). Note
that even if the queue length is greater than the allocated
service, all of it may not be utilized, e.g., when the backlogged
size is from a single job, since different chunks
of the same job cannot be scheduled simultaneously.
Denote the actual number of jobs chosen by $\bar{s}^m_j$. Note
that if $q_{j,m} \ge D_{\max} s_{\max}$, then $\bar{s}^m_j = s^m_j$.
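
One time slot of Algorithm 1 can be sketched at the workload level as follows. This is a simplified illustration under stated assumptions, not the authors' code: function names are hypothetical, ties in routing and scheduling go to the lowest index rather than being broken uniformly at random, the candidate configurations of each server are passed in as a list of maximal configurations, and the restriction that chunks of one job cannot run simultaneously is ignored (service is only wasted when a queue would go below zero).

```python
def jsq_route(q, j):
    """Server with the shortest type-j queue (lowest index on ties)."""
    return min(range(len(q)), key=lambda m: q[m][j])


def maxweight_schedule(q_m, configs):
    """Configuration maximizing sum_j s_j * q_{j,m} over the given configurations."""
    return max(configs, key=lambda s: sum(sj * qj for sj, qj in zip(s, q_m)))


def one_slot(q, arrivals, configs):
    """Advance the queues q[m][j] by one slot; decisions use the queues at slot start."""
    num_servers, num_types = len(q), len(arrivals)
    routes = [jsq_route(q, j) for j in range(num_types)]
    schedules = [maxweight_schedule(q[m], configs[m]) for m in range(num_servers)]
    new_q = [row[:] for row in q]
    for j, m_star in enumerate(routes):
        new_q[m_star][j] += arrivals[j]       # all type-j work joins one server
    for m in range(num_servers):
        for j in range(num_types):
            new_q[m][j] = max(new_q[m][j] - schedules[m][j], 0)
    return new_q, routes, schedules


if __name__ == "__main__":
    q = [[3, 0], [1, 2]]                       # q[m][j], two servers, two job types
    configs = [[(0, 2), (1, 1), (2, 0)]] * 2   # hypothetical maximal configurations
    print(one_slot(q, [2, 1], configs))
```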



Let $Y_{j,m}(t)$ denote the state of the queue for type-$j$ jobs
at server $m$, where $Y^i_{j,m}(t)$ is the (backlogged) size of the
$i$-th type-$j$ job at server $m$. It is easy to see that
$\mathbf{Y}(t) = \{Y_{j,m}(t)\}_{j,m}$ is a Markov chain under the JSQ routing and
MaxWeight scheduling. Then, $q_{j,m}(t) = \sum_i Y^i_{j,m}(t)$ is a
function of the state $Y_{j,m}(t)$.

The queue lengths of workload evolve according to the
following equation:

$$q_{j,m}(t+1) = q_{j,m}(t) + a_{j,m}(t) - \bar{s}^m_j(t) = q_{j,m}(t) + a_{j,m}(t) - s^m_j(t) + u_{j,m}(t) \quad (1)$$

where $u_{j,m}(t)$ is the unused service, given by
$u_{j,m}(t) = s^m_j(t) - \bar{s}^m_j(t)$, $s^m_j(t)$ is the MaxWeight schedule and $\bar{s}^m_j(t)$
is the actual schedule chosen by the scheduling algorithm, and
the arrivals are

$$a_{j,m}(t) = \begin{cases} a_j(t) & \text{if } m = m^*_j(t) \\ 0 & \text{otherwise.} \end{cases} \quad (2)$$

Here, $m^*_j$ is the server chosen by the routing algorithm for
type $j$ jobs. Note that

$$u_{j,m}(t) = 0 \text{ when } q_{j,m}(t) + a_{j,m}(t) \ge D_{\max} s_{\max}. \quad (3)$$

Also, denote $s = (s_j)_j$ where

$$s_j = \sum_{m=1}^{M} s^m_j. \quad (4)$$

Denote $\mathbf{a} = (a_{j,m})_{j,m}$, $\mathbf{s} = (s^m_j)_{j,m}$ and $\mathbf{u} = (u_{j,m})_{j,m}$. Also
denote $\mathbf{1}$ to be the vector with 1 in all components.

It was shown in [15] that this algorithm is throughput
optimal. Here, we will show that this algorithm is heavy traffic
optimal.



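Recall that the capacity region is bounded by $K$ hyperplanes,
each hyperplane $\mathcal{H}^{(k)}$ described by its normal vector $c^{(k)}$
and the value $b^{(k)}$. Then, for any $\lambda \in \text{interior}(\mathcal{C})$, we
can define the distance of $\lambda$ to $\mathcal{H}^{(k)}$ and the closest point,
respectively, as

$$\epsilon^{(k)} = \min_{s \in \mathcal{H}^{(k)}} \|\lambda - s\|, \qquad \lambda^{(k)} = \lambda + \epsilon^{(k)} c^{(k)} \quad (5)$$

where $\epsilon^{(k)} > 0$ for each $k$ since $\lambda \in \text{interior}(\mathcal{C})$. We let
$\epsilon = (\epsilon^{(k)})_{k=1}^{K}$ denote the vector of distances to all hyperplanes.
Note that $\lambda^{(k)}$ may be outside the capacity region $\mathcal{C}$ for some
hyperplanes. So define

$$\mathcal{K}_\lambda \triangleq \{ k \in \{1, 2, \dots, K\} : \lambda^{(k)} \in \mathcal{C} \}.$$

$\mathcal{K}_\lambda$ identifies the set of dominant hyperplanes whose closest
point to $\lambda$ is on the boundary of the capacity region $\mathcal{C}$ and
hence is a feasible average rate for service. Note that for any
$\lambda \in \text{interior}(\mathcal{C})$, the set $\mathcal{K}_\lambda$ is non-empty, and hence is
well-defined. We further define

$$\mathcal{K}^{\circ}_\lambda = \{ k \in \mathcal{K}_\lambda : \lambda^{(k)} \in \text{Relint}(\mathcal{F}^{(k)}) \}$$

where $\mathcal{F}^{(k)}$ denotes the face on which $\lambda^{(k)}$ lies and Relint
means relative interior. Thus, $\mathcal{K}^{\circ}_\lambda$ is the subset of faces in $\mathcal{K}_\lambda$
for which the projection of $\lambda$ is not shared by more than one
hyperplane.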
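
The distances in (5) and the set of dominant hyperplanes can be computed directly once the hyperplanes are given. The sketch below is a hedged illustration: the three-hyperplane region, the arrival rate and the function names are assumptions made for the example, the normals are assumed to have unit norm (so the distance is simply $b^{(k)} - \langle c^{(k)}, \lambda \rangle$), and hyperplanes are indexed from 0 rather than 1.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def distance_and_closest_point(lam, c, b):
    """Distance from lam to the hyperplane <c, s> = b and the closest point on it."""
    eps = b - dot(c, lam)  # valid because ||c|| = 1 and lam is inside the region
    return eps, [li + eps * ci for li, ci in zip(lam, c)]


def in_region(s, hyperplanes, tol=1e-9):
    """True if s >= 0 and <c, s> <= b for every hyperplane (c, b), up to tol."""
    return all(si >= -tol for si in s) and all(
        dot(c, s) <= b + tol for c, b in hyperplanes
    )


def dominant_hyperplanes(lam, hyperplanes):
    """Indices k whose closest point lam^(k) still lies in the capacity region."""
    dominant = []
    for k, (c, b) in enumerate(hyperplanes):
        _, closest = distance_and_closest_point(lam, c, b)
        if in_region(closest, hyperplanes):
            dominant.append(k)
    return dominant


if __name__ == "__main__":
    r = 1.0 / math.sqrt(2.0)
    # Hypothetical region: s1 <= 2, s2 <= 2, s1 + s2 <= 3 (normals have unit norm).
    planes = [((1.0, 0.0), 2.0), ((0.0, 1.0), 2.0), ((r, r), 3.0 * r)]
    lam = (1.8, 0.5)
    print([distance_and_closest_point(lam, c, b)[0] for c, b in planes])
    print(dominant_hyperplanes(lam, planes))
```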

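For $\epsilon = (\epsilon^{(k)})_{k=1}^{K} > 0$, let $\lambda^{(\epsilon)}$ be the arrival rate in the
interior of the capacity region so that its distance from the
hyperplane $\mathcal{H}^{(k)}$ is $\epsilon^{(k)}$. Let $\lambda^{(k)}$ be the closest point to $\lambda^{(\epsilon)}$
on $\mathcal{H}^{(k)}$. Thus, we have

$$\lambda^{(k)} = \lambda^{(\epsilon)} + \epsilon^{(k)} c^{(k)}. \quad (6)$$

Let $\mathbf{q}^{(\epsilon)}(t)$ be the queue length process when the arrival rate
is $\lambda^{(\epsilon)}$.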

Define $\mathbf{c}^{(k)} \in \mathbb{R}_+^{JM}$, indexed by $j, m$, as $\mathbf{c}^{(k)}_{j,m} = c^{(k)}_j / \sqrt{M}$. We
expect that the state space collapse occurs along the direction
$\mathbf{c}^{(k)}$. This is intuitive. For a fixed $j$, JSQ routing tries to
equalize the queue lengths across servers. For a fixed server
$m$, we expect that the state space collapse occurs along $c^{(k)}$
when approaching the hyperplane $\mathcal{H}^{(k)}$, as shown in [8]. Thus,
for JSQ routing and MaxWeight, we expect that the state space
collapse occurs along $\mathbf{c}^{(k)}$ in $\mathbb{R}^{JM}$.

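For each $k \in \mathcal{K}^{\circ}_{\lambda^{(\epsilon)}}$, define the projection and perpendicular
component of $\mathbf{q}^{(\epsilon)}$ to the vector $\mathbf{c}^{(k)}$ as follows:

$$\mathbf{q}_{\parallel}^{(\epsilon,k)} \triangleq \langle \mathbf{c}^{(k)}, \mathbf{q}^{(\epsilon)} \rangle \mathbf{c}^{(k)}, \qquad \mathbf{q}_{\perp}^{(\epsilon,k)} \triangleq \mathbf{q}^{(\epsilon)} - \mathbf{q}_{\parallel}^{(\epsilon,k)}.$$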
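
The decomposition of a queue-length vector into its components along and perpendicular to the lifted direction can be sketched as follows. The function names and the numerical example are illustrative assumptions; queue vectors are stored as `q[m][j]`, and the lifted direction has entries $c_j / \sqrt{M}$ so that it has unit norm whenever $c$ does.

```python
import math


def lift_direction(c, num_servers):
    """Lifted direction with entries c_j / sqrt(M), stored as cb[m][j]."""
    scale = math.sqrt(num_servers)
    return [[cj / scale for cj in c] for _ in range(num_servers)]


def decompose(q, c):
    """Split q[m][j] into components parallel and perpendicular to the lifted c."""
    cb = lift_direction(c, len(q))
    inner = sum(cb[m][j] * q[m][j] for m in range(len(q)) for j in range(len(c)))
    q_par = [[inner * x for x in row] for row in cb]
    q_perp = [[qx - px for qx, px in zip(q_row, p_row)]
              for q_row, p_row in zip(q, q_par)]
    return q_par, q_perp


if __name__ == "__main__":
    c = (0.6, 0.8)          # hypothetical unit-norm normal vector
    q = [[4, 4], [2, 4]]    # queue lengths at two servers, two job types
    q_par, q_perp = decompose(q, c)
    print(q_par)            # approximately [[3, 4], [3, 4]]
    print(q_perp)           # approximately [[1, 0], [-1, 0]]
```

In this example the parallel component has identical rows, matching the intuition that JSQ equalizes queues across servers, while the perpendicular component captures the imbalance between the two servers.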



In this section, we will prove the following proposition. 

Proposition 1: Consider the cloud computing system described
in Section II. Assume all the servers are identical, i.e.,
$R_{i,m} = R_i$ for all servers $m$ and resources $i$, and that JSQ
routing and MaxWeight scheduling as described in Algorithm
1 is used. Let the exogenous arrival rate be $\lambda^{(\epsilon)} \in \text{Interior}(\mathcal{C})$
and the standard deviation of the arrival vector be $\sigma^{(\epsilon)} \in \mathbb{R}^J_{++}$,
where the parameter $\epsilon = (\epsilon^{(k)})_{k=1}^{K}$ is so that $\epsilon^{(k)}$ is the
distance of $\lambda^{(\epsilon)}$ from the $k$-th hyperplane $\mathcal{H}^{(k)}$ as defined in
(5). Then for each $k \in \mathcal{K}^{\circ}_{\lambda^{(\epsilon)}}$, the steady state queue length
satisfies



5«E 



,q(t) 



< 



D, 



(e.fe) 



A((^») 2 .(^) 2 ) + ^ 



B<- k| is 



where £( e > fc ) 

In the heavy traffic limit as e( fe ) I 0, this bound is tight, i.e., 

" ~2~ 



lim e (fe) l 



where C « = ^ ( (c«) 2 , (af 

We will prove this proposition by following the three-step
procedure described in Section I, by first obtaining a lower
bound, then showing state space collapse and finally using the
state space collapse result to obtain an upper bound.

A. Lower Bound 

Since A^ e ' is in the interior of C, the process {q( e )(i)| has 
a steady state distribution. We will obtain a lower bound on 

r j c <k) ( m 

j— 1 \m— 1 

follows. 

Consider the single server queuing system, 4>^\t) with 
arrival process ^ k \ {t)) and service process given 

by at each time slot. Then cj>(t) is stochastically smaller 
than (c( fe ), q(i)' e )). Thus, we have 



E[(« 



(fc),q(<0 



in steady state as 



E 



.(fc) „(<0 



> E 



as follows [8 1 



> 



£(e,fc) 



( e ,fc) 



Using c/) 2 as Lyapunov function for the single server queue 
and noting that the drift of it should be zero in steady state, 
one can bound E 

e «E 
where (c( fe )) = 

c (e < fc) = ^(( c (fc) ) 



J 

3=1 



B 



(6,fe) 



and 



( e ( fc )) 2 



Thus, in the heavy traffic limit as e^ 5 ) J, 0, we have that 



lim e (fe) I 



c « q(e) 



> 



(7) 



where = ^= 



B. State Space Collapse 

In this subsection, we will show that there is a state space
collapse along the direction $\mathbf{c}^{(k)}$. We know that as the arrival
rate approaches the boundary of the capacity region, i.e.,
$\epsilon^{(k)} \to 0$, the steady state mean queue length $E[\|\mathbf{q}\|] \to \infty$.
We will show that as $\epsilon^{(k)} \to 0$, the queue length projected along
any direction perpendicular to $\mathbf{c}^{(k)}$ is bounded. So this constant
does not contribute to the first order term in $1/\epsilon^{(k)}$, in which we
are interested. Therefore, it is sufficient to study a bound on
the queue length along $\mathbf{c}^{(k)}$. This is called state-space collapse.
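Define the following Lyapunov functions.

$$V(\mathbf{q}) \triangleq \|\mathbf{q}\|^2 = \sum_{m=1}^{M} \sum_{j=1}^{J} q_{j,m}^2, \qquad W_{\perp}^{(k)}(\mathbf{q}) \triangleq \|\mathbf{q}_{\perp}^{(k)}\|, \qquad W_{\parallel}^{(k)}(\mathbf{q}) \triangleq \|\mathbf{q}_{\parallel}^{(k)}\|,$$

$$V_{\parallel}^{(k)}(\mathbf{q}) \triangleq \langle \mathbf{c}^{(k)}, \mathbf{q} \rangle^2 = \|\mathbf{q}_{\parallel}^{(k)}\|^2 = \frac{1}{M} \Big( \sum_{m=1}^{M} \sum_{j=1}^{J} q_{j,m} c_j^{(k)} \Big)^2.$$

Define the drift of the above Lyapunov functions.

$$\Delta V(\mathbf{q}) \triangleq [V(\mathbf{q}(t+1)) - V(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q})$$
$$\Delta W_{\perp}^{(k)}(\mathbf{q}) \triangleq [W_{\perp}^{(k)}(\mathbf{q}(t+1)) - W_{\perp}^{(k)}(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q})$$
$$\Delta W_{\parallel}^{(k)}(\mathbf{q}) \triangleq [W_{\parallel}^{(k)}(\mathbf{q}(t+1)) - W_{\parallel}^{(k)}(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q})$$
$$\Delta V_{\parallel}^{(k)}(\mathbf{q}) \triangleq [V_{\parallel}^{(k)}(\mathbf{q}(t+1)) - V_{\parallel}^{(k)}(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q})$$

To show the state space collapse happens along the direction
of $\mathbf{c}^{(k)}$, we will need a result by Hajek [10], which gives a
bound on $\|\mathbf{q}_{\perp}^{(k)}\|$ if the drift of $W_{\perp}^{(k)}(\mathbf{q})$ is negative. Here
we use the following special case of the result by Hajek, as
presented in [8].

Lemma 2: For an irreducible and aperiodic Markov Chain
$\{X[t]\}_{t \ge 0}$ over a countable state space $\mathcal{X}$, suppose $Z : \mathcal{X} \to \mathbb{R}_+$
is a nonnegative-valued Lyapunov function. We define the
drift of $Z$ at $X$ as

$$\Delta Z(X) \triangleq [Z(X[t+1]) - Z(X[t])] \, \mathcal{I}(X[t] = X),$$

where $\mathcal{I}(\cdot)$ is the indicator function. Thus, $\Delta Z(X)$ is a random
variable that measures the amount of change in the value of
$Z$ in one step, starting from state $X$. This drift is assumed to
satisfy the following conditions:

1) There exists an $\eta > 0$, and a $\kappa < \infty$ such that for all
$X \in \mathcal{X}$ with $Z(X) \ge \kappa$,

$$E[\Delta Z(X) \mid X[t] = X] \le -\eta.$$

2) There exists a $D < \infty$ such that for all $X \in \mathcal{X}$,

$$P(|\Delta Z(X)| \le D) = 1.$$

Then, there exists a $\theta^* > 0$ and a $C^* < \infty$ such that

$$\limsup_{t \to \infty} E\big[ e^{\theta^* Z(X[t])} \big] \le C^*.$$

If we further assume that the Markov Chain $\{X[t]\}_t$ is positive
recurrent, then $Z(X[t])$ converges in distribution to a random
variable $\bar{Z}$ for which

$$E\big[ e^{\theta^* \bar{Z}} \big] \le C^*,$$

which directly implies that all moments of $\bar{Z}$ exist and are
finite.

We also need Lemma 7 from [8], which gives the drift of
$W_{\perp}^{(k)}(\mathbf{q})$ in terms of drifts of $V(\mathbf{q})$ and $V_{\parallel}^{(k)}(\mathbf{q})$.

Lemma 3: Drift of $W_{\perp}^{(k)}$ can be bounded as follows:

$$\Delta W_{\perp}^{(k)}(\mathbf{q}) \le \frac{1}{2 \|\mathbf{q}_{\perp}^{(k)}\|} \big( \Delta V(\mathbf{q}) - \Delta V_{\parallel}^{(k)}(\mathbf{q}) \big) \quad \forall \mathbf{q} \in \mathbb{R}_+^{JM}. \quad (8)$$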



Let us first consider the last term in this inequality. 



E 

=E 

=E 



=E 



A^[ fe) (q (e )) q( £ )(t) = q( £ ) 
V { \ k \ q ^(t + 1)) - V[(*>(q< e) (*)) q M(t) = q( £ ) 



(c( fc ),qW(t + l)) -(c« q (£) W 
c « , (t) + a< £ > (t) - gW (t) + (*) 



q(t) = q« 

2 



=E 



q(t) - q< e > 

c( fc W°(*) +a( £ )(i) -S^(t)Y + <JcW,uW(t)V 
+ 2<JcW,q^(t) + a< £) (0 -s^(t)\ <^ C ( fc >, u (e) (<)^) 



c<*>,qW(t) 



q(t) = q( £ > 

2 



>E 



(c« , a« (t) - s« (t)) - 2 (c« , s< £ ) (t) ) (c« , u< e > (t) ) 
+2 /c<*> , q« (t) ) <V fe > , a' 6 ' (t) - s< e > (i) J) q(i) = qW 
>2 /c^.qW) ((cW.E [a«(t)| q(t) - q (e) ] 

w (*)|q(*)=q (e) ]))-2(cW,wl) 2 



2||q 



( e .fc)ii J 



M 



I J / M 



7—1 \m= 1 



x 2 



^E[ S ;^W|q(i)=q 

M 

■$>[#< e >(t)|q(t)=q 



■ 2||q ''' fc) "-E c i( A ? ) - eWc f 



-#2 

(9) 



71/ 



3=1 



M 

■£>[#M(i)|q(f)= q M 



2||q 



( e .fc)ii J 



M 



j=l \m=l 



'•(fc) 



A/ 

£E[ ai m «(t)|q(t)=qW 



(10) 



7W l|q ii 1 



(11) 



o|| n ( e 'fc)|| M J 



m— lj — 1 



#«(t)|q(t)=qM]) 



w 2e(fe) M (^ fc )| 

>-* 2 -^l| q ^ 



<M 



(12) 



where Kq, = 2JMs ma x- Equation (O follows from the fact 
that the sum of arrival rates at each server is same as the 
external arrival rate. Equation (TITTi follows from d§}. From the 

definition of C, we have that there exists £ C m such that 

M 

\( k ) = J2 X m{k) . This gives CED- Fr om Lemma [TJ we have 

m—l 

that for each to, there exists bm) such that c j X" 1 ^ = b^m 

3=1 

and (c( k \s m ^) < b [ m ] for every s m ^(t) £ C m . Therefore, 
we have, for each m, 



j 
\ V 



cjh i x T (k) - E 



a ^(t)|q(t) = q(' 



> 



3=1 

and so (fT2l is true. 

Now, let us consider the first term in (|8). By expanding the 
drift of V(q( e )) and using (0, it can be easily seen that 



Ai/(q (e) )|q (e) W-q 



0) 



<K' + E 



M J 

EE 

m— lj— 1 
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where K' = M [Y, + 0*) + 2Js TO(ra (l + D„ 



By definition of a,j. m (t), (O we have 
E 



M .1 
m— 1 j — 1 



E 2 Ci;^(*) 

(0 \W 



3=1 
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M (e) 



- i As M 

j — 1 m—l 

From (fnt and (fPfl t. we have, 
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<^+E2AfE^- 2 E ] 
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E<&r(*) 
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(16) 



where iTi = X' + 2JMD max s max . Equation ( fTSl is true 
because of MaxWeight scheduling. Note that in algorithm [TJ 
the actual service allocated to jobs of type j at server to 
is same as that of the MaxWeight schedule as long as the 
corresponding queue length is greater than D max s max . This 
gives the additional 2JMD ma xS max term. 

Assuming all the servers are identical, we have that for 
each to, C m = {\/M : A e C}. So, C m is a scaled version 
of C. Thus, A m = A/M. Since fc e /C° (e) , we also have that 
k € ^-?m(e) f° r me capacity region C m . Thus, there exists 
<jW > so that 

fiW 4 H W n{reM{: ||r - A^/Mll < 0< fc >} 

lies strictly within the face of C m that corresponds to J 7 ^. 
(Note that this is the only instance in the proof of Proposition 
[TJthat we use the assumption that all the servers are identical.) 



Call this face J 7 , 
E 



Thus we have, 



[A^(q (£) )|q (e) (*) = q (e) 
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Af 
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Vo (£ ' fe) 
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M 

= -2*wf;. 
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=i\ 3=1 



< - 25 (fe) , 



M J 



(19) 



(20) 



A m=lj'=l 

= -2<5W|qf|. 

Equation ( fT8l is true because c is a vector perpendicular to the 
face J-"m of C m whereas both A^ fc ' /M and r m lie on the face 



(21) 



jF [ m ] . SO, 



1 



3=1 



M 



is true because £ g^'^ ( ^ r 



3=1 



0. Equation ( fT9l ) 



is inner product in Mi 



which is minimized when r m is chosen to be on the boundary 

(k) { >- {k) \ 

of B\,L so that -4-j r™ I points in the opposite direction 



to 



Since 



M J ( (e fe) 
m— It — 1 v 



m— 1 y J= 1 / m—lj — 

Now substituting (IT/l and (|2TT > in ([§), we get 



we g' 



et (EQj. 
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A W4 fe) (q (e) )|q (e) (*) =q (e) 



(e,fc) 

qi 



<- 



whenever (q' 



> 



Moreover, since the departures in each time slot are bounded
and the arrivals are finite, there is a $D < \infty$ such that
$|\Delta Z(X)| \le D$ almost surely. Now, applying Lemma 2,
we have the following proposition.

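Proposition 2: Assuming all the servers are identical, for
$\lambda^{(\epsilon)} \in \mathcal{C}$, under JSQ routing and MaxWeight scheduling,
for every $k \in \mathcal{K}^{\circ}_{\lambda^{(\epsilon)}}$, there exists a set of finite constants
$\{N_r^{(k)}\}_{r=1,2,\dots}$ such that

$$E\big[ \|\mathbf{q}_{\perp}^{(\epsilon,k)}\|^r \big] \le N_r^{(k)}$$

for all $\epsilon > 0$ and for each $r = 1, 2, \dots$.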

As in ETI . ||8), note that k G ^a<o * s an i m P ortant 
assumption here. If k G K \ ^?( e )> ^ e- ' ^ ^ e arr i va l rate 
approaches a corner point of the capacity region as —> 0, 
then there is no constant 5^ so that Bi^ lies in the face 
T^ k \ In other words, the 5^ depends on g( fe ) and so the 
bound obtained by Lemma [2] also depends on . 

Remark: As stated in Proposition Q] our results hold only 
for the case of identical servers, which is the most practical 
scenario. However, we have written the proofs more generally 
whenever we can so that it is clear where we need the identical 
server assumption. In particular, in this subsection, up to 
Equation ( TTol l. we do not need this assumption, but we have 
used the assumption after that, in analyzing the drift of V(q). 
The upper bound in the next section is valid more generally 
if one can establish state-space collapse for the non-identical 
server case. However, at this time, this is an open problem. 

C. Upper Bound 

In this section, we will obtain an upper bound on the
weighted queue length, $E[\langle \mathbf{c}^{(k)}, \mathbf{q}^{(\epsilon)} \rangle]$ in steady state, and
show that in the asymptotic limit as $\epsilon^{(k)} \downarrow 0$, this coincides
with the lower bound.



Noting that the drift of AM / |'j' k '' is zero in steady state, it 



Let tt^ be the steady-state probability that the MaxWeight 
schedule chosen is from the face F^ k \ i.e., 



tt« =p((c,s(t)) =6 (fe) ) 



M 

where Sj — ^ s™ 1 as defined in (0). Also, define 

m— 1 

7 (fc) = min {V fc) - (c, r) : r G S \ F {k) } 
Then noting that in steady state, 



E 



it can be shown as in Claim 1 in [8] that for for any e( fc ) G 
(0,7 (fc) ), 



Then, note that 



E 



(V*Mc^))) : 



= (l-7r«)E 



Define C m C R^ 1 as C 



(c,s m ail) 2 j (25) 
C\ x ... x Cm- Then, C m is a 



convex polygon. 

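Claim 1: Let $q^m \in \mathbb{R}^J_+$ for each $m \in \{1, 2, \dots, M\}$. Denote
$\mathbf{q} = (q^m)_{m=1}^{M} \in \mathbb{R}_+^{JM}$. If, for each $m$, $(s^m)^*$ is a solution of
$\max_{s \in \mathcal{C}_m} \langle q^m, s \rangle$, then $\mathbf{s}^* = ((s^m)^*)_m$ is a solution of
$\max_{\mathbf{s} \in \mathcal{C}_1 \times \dots \times \mathcal{C}_M} \langle \mathbf{q}, \mathbf{s} \rangle$.

Proof: Since $\mathbf{s}^* \in \mathcal{C}_1 \times \dots \times \mathcal{C}_M$, we have
$\langle \mathbf{q}, \mathbf{s}^* \rangle \le \max_{\mathbf{s} \in \mathcal{C}_1 \times \dots \times \mathcal{C}_M} \langle \mathbf{q}, \mathbf{s} \rangle$.
Note that $\max_{\mathbf{s} \in \mathcal{C}_1 \times \dots \times \mathcal{C}_M} \langle \mathbf{q}, \mathbf{s} \rangle = \sum_{m=1}^{M} \max_{s^m \in \mathcal{C}_m} \langle q^m, s^m \rangle$.
Therefore, if $\langle \mathbf{q}, \mathbf{s}^* \rangle < \max \langle \mathbf{q}, \mathbf{s} \rangle$, we have
$\sum_{m=1}^{M} \langle q^m, (s^m)^* \rangle < \sum_{m=1}^{M} \max_{s^m \in \mathcal{C}_m} \langle q^m, s^m \rangle$.
Then there exists an $m \le M$ such that
$\langle q^m, (s^m)^* \rangle < \max_{s^m \in \mathcal{C}_m} \langle q^m, s^m \rangle$, which is a contradiction. ■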

Therefore, choosing a MaxWeight schedule at each server 
is same as choosing a MaxWeight schedule from the con- 
vex polygon, C m . Since there are a finite number of fea- 



can be shown, as in Lemma 
for any c G we have 



from that in steady state, sible schedules, given c<» G 



such that llc^l 



1, there exists an angle 9^ G (0, \ ] such that, for all 
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<c,q(t) 



■a(t)-s(t)) (c,u(t))] 



(22) 

(23) 
(24) 



q e qe 



JM 



(fe)| 



> ||q|| cos 



(#*>)}, (i.e., 



for all 



q G R+ M such that 6* (fc) < where 9 ab represents the 



qq 



We will obtain an upper bound on E [(c( k \ q*- 6 -*)] by bound- 
ing each of the above terms. Before that, we need the following 
definitions and results. 



angle between vectors a and b), we have 

( C ( fc ),sW)z(q(t)=q) = 6( fe )/^. 
We can bound the unused service as follows. 

E [(c( fe ),u(<)\l < E [(cW.sft) - a(<) V 



M 
1 
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>M 
e (fe) 



>M 



(E [(cW,^))]-(c«,A e )) 

(fe) ,s(i))] -(&<*> - e «)) 
(26) 



E 



where the last inequality follows from the fact that the 
MaxWeight schedule lies inside the capacity region and so 
E [(cS k \s(t))] < b^l 

Now, we will bound each of the terms in (l24l i. Let us first 
consider the term in (122V Given that the arrival rate if A 6 we 
have, 
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=E 

-E 

e (k) 



c« q(t))(c« s(t)-a(t) 
bW l 



.(fe) 
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c W ,q(«) 
c (fc) ,qW 



M VM 



(fe) 
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qf* } (*)ll 



&(*) 



c (fc) ,s(t) 



Now, we will bound the last term in this equation using the 
definition of 6^ as follows. 
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=E 
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x cot [ 6 {k) 
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E 

M 
cot(6»( fe )) 



qf (t)||(6W-( C W, S (t)))]cot(^)) (28) 



(29) 



/ill 



^2 (fc)1 W ((&W) 2 + < C ,Wl) 2 ) 



where (27) follows from the definition of $\theta^{(k)}$, (28) follows from our choice of $c^{(k)}$ and the definition of $s$, (29) follows from the Cauchy-Schwarz inequality, and the last inequality follows from state-space collapse (Proposition 2) and (25). Thus, we have
c< fc >, q(t))(cW S(t)-a(t)



( fe ) - 



e (fe) 
>^E 



q(*) 
Now, consider the first term in (23). Again, using the fact that the arrival rate is $\lambda^{\epsilon}$, we have
(b^y +( c , w i) ;



(32) 



where $\zeta^{(\epsilon,k)}$ was defined as $\zeta^{(\epsilon,k)}$



_ (e«=>) 2 



/A/ 



737 



cW) , fcrW) \. Equation ( 1311 ) is obtained by 



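noting that $E[a(t)] = \lambda^{\epsilon}$ and so $E[\langle c^{(k)}, a(t) - \lambda^{\epsilon}\rangle^2] = \mathrm{var}(\langle c^{(k)}, a(t) - \lambda^{\epsilon}\rangle) = \langle (c^{(k)})^2, \mathrm{var}(a(t) - \lambda^{\epsilon})\rangle$.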
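The variance identity for a weighted sum can be checked by direct enumeration on a small example. This is only an illustrative sketch: the function name `weighted_variance_check` and the toy distributions are hypothetical, and the check assumes that the components of the arrival vector are independent, which is what makes the cross terms vanish.

```python
from itertools import product


def weighted_variance_check(c, marginals):
    """Compare E[<c, a - E[a]>^2] with <c^2, var(a)> for independent components.

    marginals[j] is a list of (value, probability) pairs describing component a_j.
    Independence across components is an assumption of this sketch.
    """
    means = [sum(v * p for v, p in dist) for dist in marginals]
    variances = [
        sum((v - m) ** 2 * p for v, p in dist)
        for dist, m in zip(marginals, means)
    ]
    # Left side: enumerate the joint distribution (product of the marginals).
    lhs = 0.0
    for combo in product(*marginals):
        prob = 1.0
        inner = 0.0
        for cj, (v, p), m in zip(c, combo, means):
            prob *= p
            inner += cj * (v - m)
        lhs += prob * inner ** 2
    # Right side: inner product of the squared weights with the variances.
    rhs = sum(cj ** 2 * var for cj, var in zip(c, variances))
    return lhs, rhs
```

If the components were correlated, the left side would pick up covariance terms and the two values would generally differ.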
Consider the second term in (23).
C<*> Is



(33) 



where the last inequality follows from 

Now, we consider the term in (24). We need some definitions so that we can only consider the non-zero components of $c$. Let $\mathcal{L}_{++}^{(k)} = \{ j \in \{1, 2, \ldots, J\} : c_j^{(k)} > 0 \}$. Define



( C S) w .q = feml^W and u = ( Ujm ) 



Also define the projections $q_{\parallel}^{(k)} = \langle c^{(k)}, q\rangle c^{(k)}$ and $q_{\perp}^{(k)} = q - q_{\parallel}^{(k)}$. Similarly, define $u_{\parallel}^{(k)}$ and $u_{\perp}^{(k)}$. Then, we have
=E

=E 

=E 
E [(1, u(t))] + V jV< fc) E[(S(t),g(0)]
E [<1, u(t))] + ^JVfwEKl.Sft))] 



where ( [34-b follows from @ and from Cauchy-Schwarz in- 
equality. Equation (|35l l follows from from state-space collapse 



(Proposition |2), since 
Note that 
(A) A (k)

where $c_{\min}^{(k)} = \min_{j \in \mathcal{L}_{++}^{(k)}} c_j^{(k)} > 0$ and the last inequality follows



from (26). Thus, we have
(32), (33) and (36) in (24), we get
+ cot (V fe >) JiVf)l^(( & W) 2 + ( C)W l) 2 ).



Thus, in the heavy traffic limit as $\epsilon^{(k)} \downarrow 0$, we have that



lim e (fc) E 



C W,q( £ ) 



< 



(37) 



where C (k) was defined as C (fc) = -j= ((c (fc) ) , (cr) 2 ^. Thus, 
(0 and ( f3Tb establish the first moment heavy-traffic optimality 
of JSQ routing and MaxWeight scheduling policy. The proof 
of Proposition Q] is now complete. 

D. Power-of-Two-Choices Routing and MaxWeight Scheduling 

JSQ routing needs complete queue length information at 
the router. In practice, this communication overhead can be 
considerable when the number of servers is large. An alternate 
algorithm is the power-of-two-choices routing algorithm. 

In this algorithm, in each time slot $t$, for each type of job $j$, two servers $m_1^j(t)$ and $m_2^j(t)$ are chosen uniformly at random. All the type $j$ job arrivals in this time slot are then routed to the server with the shorter queue length among these two, i.e.,

$$m_j^*(t) = \arg\min_{m \in \{m_1^j(t),\, m_2^j(t)\}} q_{jm}(t).$$
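The routing rule just described can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `power_of_two_route` is hypothetical, `queues` holds the current queue lengths of one job type at each server, the two sampled servers are assumed to be distinct, and ties are assumed to be broken uniformly at random.

```python
import random


def power_of_two_route(queues, rng):
    """Return the index of the server chosen by power-of-two-choices routing.

    Two distinct servers are sampled uniformly at random (an assumption of
    this sketch) and the one with the shorter queue is returned. Ties are
    broken uniformly at random.
    """
    m1, m2 = rng.sample(range(len(queues)), 2)
    if queues[m1] < queues[m2]:
        return m1
    if queues[m2] < queues[m1]:
        return m2
    return rng.choice([m1, m2])
```

Only two queue lengths are read per routing decision, which is the source of the reduced communication overhead compared with JSQ, where all queue lengths are needed.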

It was shown in [15] that the power-of-two-choices routing algorithm with MaxWeight scheduling is throughput optimal if all the servers are identical. From the proof of throughput optimality, one obtains



[a 7(q (e) )|q (e) (*) 



<K' 



M J 

EE 2 ". 



(e) w 
it)



J M 

E 2 ^E%- E 

Note that this inequality is identical to (15) in the proof of state-space collapse of the JSQ routing and MaxWeight scheduling policy. Also note that the remainder of the proof of state-space collapse and upper bound in Sections III-B and III-C is independent of the routing policy. Moreover, the proof of the lower bound in Section III-A is also valid here. Thus, once we have the above relation, the proof of heavy traffic optimality of this policy is identical to that of the JSQ routing and MaxWeight scheduling policy.

IV. Power-of-Two-Choices Routing

In this section, we consider the power-of-two-choices rout- 
ing algorithm, without any scheduling. This is a special case 
of the model considered in the previous section when all the 
jobs are of the same type. In this case, there is a single queue 
at each server and no scheduling is needed. 

Note on Notation 

In this section, since $J = 1$, we denote all vectors (in $\mathbb{R}^M$) in bold font, e.g., $\mathbf{x}$.

The result from previous section is not applicable here be- 
cause of the following reason. In Proposition Q] a sequence of 
systems with arrival rate approaching a face of the capacity 
region, along its normal vector were considered. The normal 



vector of the face plays an important role in the state space 
collapse, and so the upper bound obtained is in terms of this 
normal. So, this result cannot be applied if the arrival rates 
were approaching a corner point where there is no common 
normal vector. In particular, the proof of state space collapse in 
Section III-B is not applicable here because one cannot define
a ball B?2) as m G3 at a corner point. 

Let $A(t)$ denote the set of jobs that arrive at the beginning of time slot $t$. Let $D_k$ be the size of the $k$th job. We define $a(t) = \sum_{k \in A(t)} D_k$ to be the overall size of the jobs in $A(t)$, or the total time slots requested by the jobs. We assume that $a(t)$ is a stochastic process which is i.i.d. across time slots, with $E[a(t)] = \lambda$ and $\Pr(a(t) = 0) > \epsilon_a$ for some $\epsilon_a > 0$ for all $t$. Let $\sigma^2 = \mathrm{var}[a(t)]$. Let $X(t)$ denote the servers chosen at time slot $t$. So, $X(t)$ can take one of ${}^M C_2$ values of the form $(m, m')$ where $m, m' \in \mathbb{Z}_+$ and $1 \le m < m' \le M$. Here ${}^M C_2$ denotes the number of 2-combinations in a set of size $M$. Note that $X(t)$ is an i.i.d. random process with a uniform distribution over all possible values. Define ${}^M C_2$ different arrival processes denoted by $a_{m,m'}(t)$ with $1 \le m < m' \le M$ as follows. If $X(t) = (\tilde{m}, \tilde{m}')$, then

$$a_{m,m'}(t) = \begin{cases} a(t) & \text{for } m = \tilde{m} \text{ and } m' = \tilde{m}' \\ 0 & \text{otherwise.} \end{cases}$$

Thus, $\{a_{m,m'}(t)\}$ can be thought of as a set of correlated arrival processes. They are correlated so that only one of them can have a non-zero value at each time. Let $\lambda_{m,m'} = E[a_{m,m'}(t)]$. Then $\lambda_{m,m'} = \lambda / {}^M C_2$. The arrivals in $a_{m,m'}(t)$ can be routed only to either server $m$ or server $m'$. According to the power-of-two-choices algorithm, all the jobs are then routed to the server with the smallest queue among $m$ and $m'$. Ties are broken at random. Let $a_m(t)$ denote the arrivals to server $m$ at time $t$ after routing.

Let $\mu$ be the amount of service available in each time slot at each server. Not all of this service may be used, either because the queue is empty or because different chunks of the same job cannot be served simultaneously. Let $s_m(t)$ be the actual amount of service scheduled in time slot $t$ at server $m$. Let $u_m(t)$ denote the unused service, which is defined as $u_m(t) = \mu - s_m(t)$. Let $q_m(t)$ denote the queue length at server $m$ at time $t$, and let $\mathbf{q}(t)$ denote the vector $(q_1(t), q_2(t), \ldots, q_M(t))$. Then, we have

$$q_m(t+1) = q_m(t) + a_m(t) - \mu + u_m(t).$$

Note that

$$u_m(t) = 0 \text{ whenever } q_m(t) + a_m(t) > D_{\max}\,\mu. \qquad (38)$$

We again follow the procedure used in the previous section to show heavy traffic optimality. Since the power-of-two-choices algorithm tries to equalize any two randomly chosen queues, we expect that there is a state-space collapse along the direction where all queues are equal, similar to the JSQ algorithm.

Let $\mathbf{c}_1 = \frac{1}{\sqrt{M}}(1, 1, \ldots, 1)$ be the unit vector in $\mathbb{R}^M$ along which we expect state-space collapse. Let $\mathbf{1}$ denote the vector $(1, 1, \ldots, 1)$. For any $\mathbf{Q} \in \mathbb{R}^M$, define $\mathbf{Q}_{\parallel}$ to be the component of $\mathbf{Q}$ along $\mathbf{c}_1$, i.e., $\mathbf{Q}_{\parallel} = \langle \mathbf{Q}, \mathbf{c}_1\rangle \mathbf{c}_1$, where $\langle \cdot, \cdot \rangle$ denotes the canonical dot product. Thus, $\mathbf{Q}_{\parallel} = \frac{\sum_m Q_m}{M} \mathbf{1}$. Define $\mathbf{Q}_{\perp}$ to be the component of $\mathbf{Q}$ perpendicular to $\mathbf{Q}_{\parallel}$, i.e., $\mathbf{Q}_{\perp} = \mathbf{Q} - \mathbf{Q}_{\parallel}$.

Define the Lyapunov functions $V_{\parallel}(\mathbf{Q}) = \|\mathbf{Q}_{\parallel}\|^2 = \frac{1}{M}\left(\sum_m Q_m\right)^2$ and $W_{\perp}(\mathbf{Q}) = \|\mathbf{Q}_{\perp}\|$.

A. Lower Bound

Consider an arrival process with arrival rate $\lambda^{(\epsilon)}$ such that $\epsilon = M\mu - \lambda^{(\epsilon)}$. Let $\mathbf{q}^{(\epsilon)}(t)$ denote the corresponding queue length vector. Since the system is stabilizable, there exists a steady-state distribution of $\mathbf{q}^{(\epsilon)}(t)$. Again, lower bounding $\sum_m q_m^{(\epsilon)}$ by a single queue length as in Section III-A we have
where $B_1 = M s_{\max}$. Thus, in the heavy-traffic limit we have

$$\liminf_{\epsilon \downarrow 0} \; \epsilon \, E\left[\sum_m q_m^{(\epsilon)}\right] \ge \frac{\sigma^2}{2}. \qquad (39)$$



B. State Space Collapse 

For simplicity of notation, in this sub-section, we write $\mathbf{q}$ for $\mathbf{q}^{(\epsilon)}$. We will bound the drift of the Lyapunov function $W_{\perp}(\mathbf{q})$, and again use Lemma 2 to obtain state space collapse. We again use (8) with $\mathbf{c}_1$ instead of $c^{(k)}$ to get the drift of $W_{\perp}(\mathbf{q})$ in terms of the drifts of $V(\mathbf{q})$ and $V_{\parallel}(\mathbf{q})$.

Let us first consider the last term.

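$$
\begin{aligned}
& E[\Delta V_{\parallel}(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}] \\
&= E[V_{\parallel}(\mathbf{q}(t+1)) - V_{\parallel}(\mathbf{q}(t)) \mid \mathbf{q}(t) = \mathbf{q}] \\
&= \frac{1}{M} E\left[\Big(\sum_m q_m(t+1)\Big)^2 - \Big(\sum_m q_m(t)\Big)^2 \,\Big|\, \mathbf{q}(t) = \mathbf{q}\right] \\
&= \frac{1}{M} E\left[\Big(\sum_m \big(q_m(t) + a_m(t) - \mu\big)\Big)^2 + \Big(\sum_m u_m(t)\Big)^2 + 2\Big(\sum_m \big(q_m(t) + a_m(t) - \mu\big)\Big)\Big(\sum_m u_m(t)\Big) - \Big(\sum_m q_m(t)\Big)^2 \,\Big|\, \mathbf{q}(t) = \mathbf{q}\right] \\
&\ge \frac{1}{M} E\left[\Big(\sum_m \big(a_m(t) - \mu\big)\Big)^2 + 2\Big(\sum_m q_m(t)\Big)\Big(\sum_m \big(a_m(t) - \mu\big)\Big) - 2M\mu\Big(\sum_m u_m(t)\Big) \,\Big|\, \mathbf{q}(t) = \mathbf{q}\right] \\
&\ge -K_3 + \frac{2}{M}\Big(\sum_m q_m(t)\Big) E\left[\sum_m \big(a_m(t) - \mu\big) \,\Big|\, \mathbf{q}(t) = \mathbf{q}\right], \qquad (40)
\end{aligned}
$$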



where $K_3 = 2M\mu^2$ is obtained by bounding $s_m(t)$ and $u_m(t)$ by $s_{\max}$.
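The lower bound on the drift of $V_{\parallel}$ can be checked on concrete one-step samples. The sketch below is illustrative only: the function names are hypothetical, and it assumes $0 \le u_m(t) \le \mu$ and nonnegative queues and arrivals, which is all the algebra needs. It compares the realized one-step change of $\frac{1}{M}(\sum_m q_m)^2$ with the bound $\frac{2}{M}(\sum_m q_m)\sum_m (a_m - \mu) - 2M\mu^2$.

```python
def parallel_drift(q, a, u, mu):
    """One-step change of V_par(q) = (sum_m q_m)^2 / M under q + a - mu + u."""
    M = len(q)
    before = sum(q)
    after = sum(qm + am - mu + um for qm, am, um in zip(q, a, u))
    return (after ** 2 - before ** 2) / M


def parallel_drift_lower_bound(q, a, mu):
    """(2/M) (sum_m q_m) (sum_m (a_m - mu)) - K3, with K3 = 2 M mu^2.

    Valid per sample path when 0 <= u_m <= mu and q, a are nonnegative
    (assumptions of this sketch).
    """
    M = len(q)
    K3 = 2 * M * mu ** 2
    return (2.0 / M) * sum(q) * (sum(a) - M * mu) - K3
```

Taking conditional expectations of both sides gives an inequality of the same shape as (40).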

Now, we will bound the first term in (8). Expanding $E[\Delta V(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}]$ and using (38), it is easy to see that


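$$E[\Delta V(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}] \le K_4 - 2\mu \sum_m q_m(t) + 2\, E_X\left[ E\Big[\sum_m q_m(t)\, a_m(t) \,\Big|\, \mathbf{q}(t) = \mathbf{q},\, X(t) = (i, j)\Big] \right],$$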



where $K_4 = M(2\mu^2(D_{\max} + 1) + \sigma^2 + \lambda^2)$. Let $p$ be a permutation of $(1, 2, \ldots, M)$ so that $q_{p(1)} \le q_{p(2)} \le \cdots \le q_{p(M)}$. Let $p'$ be the inverse permutation. In other words, $p'(m)$ is the position of $m$ in the permutation $p$. Let $q_{\min} = q_{p(1)}$ and $q_{\max} = q_{p(M)}$. Then, we have

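$$
\begin{aligned}
E[\Delta V(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}]
&\le K_4 - 2\mu \sum_m q_m(t) + \frac{2 q_{\min} \lambda}{{}^M C_2} + \frac{1}{{}^M C_2} \sum_{(i,j) \ne (p(1), p(M))} E\left[q_i(t) a(t) + q_j(t) a(t) \mid X(t) = (i, j)\right] \\
&= K_4 - 2\mu \sum_m q_m(t) - \frac{\lambda}{{}^M C_2}\left(q_{\max} - q_{\min}\right) + \frac{2\lambda}{M} \sum_m q_m(t).
\end{aligned}
$$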
Note that

$$\|\mathbf{q}_{\perp}\|^2 = \sum_m \Big(q_m - \frac{1}{M}\sum_{m'} q_{m'}\Big)^2 \le \sum_m \left(q_{\max} - q_{\min}\right)^2 = M \left(q_{\max} - q_{\min}\right)^2.$$

Thus, we have,

$$E[\Delta V(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}] \le K_4 - \frac{2\epsilon}{M} \sum_m q_m(t) - \frac{\lambda}{{}^M C_2 \sqrt{M}} \|\mathbf{q}_{\perp}\|.$$



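Substituting this and (40) in (8), we have

$$E[\Delta W_{\perp}(\mathbf{q}) \mid \mathbf{q}(t) = \mathbf{q}] \le \frac{K_3 + K_4}{2\|\mathbf{q}_{\perp}\|} - \frac{\lambda}{{}^M C_2} \frac{1}{2\sqrt{M}}.$$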



This means that we have negative drift for sufficiently large $W_{\perp}(\mathbf{q})$. Since the drift of $W_{\perp}(\mathbf{q})$ is finite with probability 1, using Lemma 2, there exist finite constants $\{N'_r\}_{r = 1, 2, \ldots}$ such that $E[\|\mathbf{q}_{\perp}^{(\epsilon)}\|^r] \le N'_r$ for each $r = 1, 2, \ldots$.
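The two ingredients of the drift argument in this subsection can be checked numerically for a given queue vector. The sketch below is illustrative and uses hypothetical function names. It assumes that the pair $X(t)$ is uniform over all ${}^M C_2$ pairs, that all arriving work joins the shorter of the two queues, and that $\lambda = E[a(t)]$. `routing_term_exact` computes $2E[\sum_m q_m a_m \mid \mathbf{q}]$ exactly, `routing_term_bound` evaluates $\frac{2\lambda}{M}\sum_m q_m - \frac{\lambda}{{}^M C_2}(q_{\max} - q_{\min})$, and `drift_threshold` returns the value of $\|\mathbf{q}_{\perp}\|$ above which a bound of the form $\frac{K_3 + K_4}{2\|\mathbf{q}_{\perp}\|} - \frac{\lambda}{2\sqrt{M}\,{}^M C_2}$ becomes negative.

```python
import math
from itertools import combinations


def routing_term_exact(q, lam):
    """2 * E[sum_m q_m a_m | q] when each pair is equally likely and the
    arriving work (mean lam) joins the shorter of the two sampled queues."""
    pairs = list(combinations(range(len(q)), 2))
    return 2.0 * lam * sum(min(q[i], q[j]) for i, j in pairs) / len(pairs)


def routing_term_bound(q, lam):
    """Upper bound (2 lam / M) sum_m q_m - (lam / C(M,2)) (q_max - q_min)."""
    M = len(q)
    n_pairs = M * (M - 1) // 2
    return lam * (2.0 * sum(q) / M - (max(q) - min(q)) / n_pairs)


def perp_norm(q):
    """Norm of the component of q perpendicular to the all-ones direction."""
    mean = sum(q) / len(q)
    return math.sqrt(sum((x - mean) ** 2 for x in q))


def drift_threshold(K3, K4, lam, M):
    """Value of ||q_perp|| above which (K3 + K4) / (2 x) - lam / (2 sqrt(M) C(M,2)) < 0."""
    n_pairs = M * (M - 1) // 2
    return (K3 + K4) * n_pairs * math.sqrt(M) / lam
```

The bound is tight when all queues are equal, since then every pair contributes the same amount and $q_{\max} = q_{\min}$.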



C. Upper Bound 

The upper bound is again obtained by bounding each of 
the terms in (24). This is identical to the case of JSQ routing
(Proposition 3 in (|HJ). So, we will not repeat the proof here, 
but just state the upper bound. 
(e) _
>



(0 
where B,^
Thus, in the heavy traffic limit, we have
lim inf el



> 



This coincides with the heavy-traffic lower bound in (39). This establishes the first-moment heavy-traffic optimality of the power-of-two-choices routing algorithm.
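A small simulation can give a rough feel for this result, although it is no substitute for the proof. The sketch below is a simplified model under stated assumptions and is not the authors' code: the name `simulate_total_queue` and its parameters are hypothetical, each server is assumed to serve up to `mu` units of work per slot (so service is unused only when a queue empties, ignoring the constraint that chunks of the same job cannot be served simultaneously), and all work arriving in a slot is routed to the shorter of two distinct randomly chosen queues.

```python
import random


def simulate_total_queue(M, mu, sample_arrival, T, seed=0):
    """Time-average of sum_m q_m(t) under power-of-two-choices routing.

    sample_arrival(rng) returns the total work a(t) arriving in a slot.
    The queue update is q_m(t+1) = max(q_m(t) + a_m(t) - mu, 0), which is
    a simplification of the model in the text.
    """
    rng = random.Random(seed)
    q = [0.0] * M
    total = 0.0
    for _ in range(T):
        a = sample_arrival(rng)
        m1, m2 = rng.sample(range(M), 2)
        # Route to the shorter queue; break ties with a fair coin.
        if q[m1] < q[m2] or (q[m1] == q[m2] and rng.random() < 0.5):
            target = m1
        else:
            target = m2
        q[target] += a
        q = [max(x - mu, 0.0) for x in q]
        total += sum(q)
    return total / T
```

To probe the heavy-traffic regime, one could pick an arrival distribution with mean $M\mu - \epsilon$, run the simulation for several decreasing values of $\epsilon$ with a long horizon `T`, and compare $\epsilon$ times the returned average with the limit in (39). Long horizons are needed because the queue lengths grow like $1/\epsilon$.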

V. Conclusions 

We considered a stochastic model for load balancing and 
scheduling in cloud computing clusters. We studied the per- 
formance of JSQ routing and MaxWeight scheduling policy 
under this model. It was known that this policy is throughput 
optimal. We have shown that it is heavy traffic optimal when 
all the servers are identical. We also found that replacing JSQ routing with power-of-two-choices routing preserves heavy traffic optimality.

We then considered a simpler setting where the jobs are 
of the same type, so only load balancing is needed. It has 
been established by others using diffusion limit arguments that 
the power-of-two-choices algorithm is heavy traffic optimal. 
We presented a steady-state version of this result here using 
Lyapunov drift arguments. 